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■ When analyzing microarray data, hierarchical models are often 

used to share information across genes when estimating means and 
variances or identifying differential expression. Many methods utilize 
some form of the two-level hierarchical model structure suggested 
by Kendziorski et al. [Stat. Med. (2003) 22 3899-3914] in which the 
first level describes the distribution of latent mean expression levels 
among genes and among differentially expressed treatments within 
a gene. The second level describes the conditional distribution, given 
a latent mean, of repeated observations for a single gene and treat- 
ment. Many of these models, including those used in Kendziorski's 
et al. [Stat. Med. (2003) 22 3899-3914] EBarrays package, assume 
that expression level changes due to treatment effects have the same 
distribution as expression level changes from gene to gene. We present 
empirical evidence that this assumption is often inadequate and pro- 
pose three-level hierarchical models as extensions to the two-level 
log-normal based EBarrays models to address this inadequacy. We 
demonstrate that use of our three-level models dramatically changes 
analysis results for a variety of microarray data sets and verify the va- 
lidity and improved performance of our suggested method in a series 
of simulation studies. We also illustrate the importance of account- 
ing for the uncertainty of gene-specific error variance estimates when 
using hierarchical models to identify differentially expressed genes. 

1. Introduction. There are many analytic methods for microarray data 
that utilize a hierarchical model to share information across genes when 
estimating mean expression levels. A large subset of these methods model 
differences in expression levels from gene to gene and differences in expres- 
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sion levels caused by treatment effects with a single distribution. Canon- 
ical examples of such methods are implemented in the EBarrays package 
for R developed by Kendziorski et al. (2003). This work has been influ- 
ential as indicated by a variety of recent methods that cite Kendziorski 
et al. (2003) and follow their modeling strategy. Examples include Newton 
et al. (2004), Yuan and Kendziorski (2006a, 2006b), Yuan (2006), Lo and 
Gottardo (2007), Keles, (2007), Wei and Li (2007, 2008), Wu et al. (2007), 
Jensen et al. (2009), and Rossell (2009). 

The analytic methods provided in EBarrays are based on two-level hierar- 
chical parametric models that can be used to analyze data from experiments 
with more than two treatment groups and produce posterior expression pat- 
tern probabilities, which can be used to assess the significance of and classify 
differential expression of genes. The first level of the hierarchical model de- 
scribes the distribution of latent mean expression levels among genes and 
among differentially expressed (DE) treatments within a gene. The second 
level describes the conditional distribution, given a latent mean, of repeated 
observations for a single gene and treatment. 

A necessary user input to models like those included in EBarrays is a list 
of possible expression patterns. In a two-treatment experiment, the only two 
expression patterns are equivalent expression and differential expression. In 
general, each pattern describes how to partition the experimental units into 
groups based on the experimental conditions or treatments associated with 
the experimental units. An analysis based on these models can yield a gene- 
specific posterior probability estimate for each pattern. 

The application of hierarchical models to microarray data has many ben- 
efits: "sharing" information across genes compensates for having few repli- 
cates, users may define expression patterns of interest involving two or more 
experimental conditions, posterior probabilities assigned to expression pat- 
terns are easy to interpret and allow for easy classification or ranking, and 
simultaneous analysis of all genes in a data set greatly reduces the dimension- 
ality of the inference problem. While the work of Kendziorski et al. (2003) 
lays a foundation for a powerful method of microarray analysis upon which 
many methods have been developed, there is room to relax assumptions and 
to improve the models described. 

The main point of this paper concerns the assumption — implied by the 
modeling strategy of Kendziorski et al. (2003) — that expression changes 
across genes have the same distribution as expression changes caused by 
treatment effects. This assumption is convenient for computational reasons 
but has undesirable consequences. In particular, if expression differences 
from gene to gene tend to be larger than treatment effects, the power to 
identify differentially expressed genes will be reduced. Based on our expe- 
rience with microarray data, we see no reason to believe that expression 
differences across genes have the same distribution as expression differences 
caused by treatment effects in all experiments. Thus, we propose to relax this 
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assumption by adding an additional level to the hierarchy of Kendziorski's 
et al. (2003) lognormal-normal (LNN) model. This creates a three-level hier- 
archical model that we will call the lognormal-normal-normal (LN 3 ) model. 

A secondary point of this paper concerns the assumption of a constant 
coefficient of variation used in Kendziorski's et al. (2003) gamma-gamma 
(GG) and lognormal-normal (LNN) models, which, for the latter model, 
implies an error variance of log expression values that is common to all 
genes. This assumption is now widely regarded as untenable. To address 
this, Lo and Gottardo (2007) introduced a method to relax the assumption 
of the GG and LNN models, and many methods to estimate gene-specific 
error variances for microarray data have been developed. [See, e.g., Baldi 
and Long (2001), Lonnstedt and Speed (2002), Wright and Simon (2003), 
Cui et al. (2005).] Kendziorski's et al. (2003) EBarrays package includes 
the LNN-moderated variance (LNNMV) method, which uses shrunken point 
estimates of gene-specific error variances similar to those described by Smyth 
(2004). We briefly demonstrate that using point estimates of gene-specific 
error variances without accounting for their uncertainty produces liberal 
posterior pattern probability estimates, which causes underestimation of the 
proportion of false positives on a list of significant genes. We propose a simple 
adaptation to the LNNMV method to account for the uncertainty in gene- 
specific variance estimates and demonstrate this corrects the liberal bias in 
the estimated expression pattern posterior probabilities. Finally, we combine 
our proposed three-level hierarchical modeling strategy with gene-specific 
error variance modeling to obtain a more general model denoted LN 3 MV. 

We formally describe the four lognormal based models (LNN, LNNMV, 
LN 3 , and LN 3 MV) and corresponding analytic methods in Section 2. In 
Section 3 we present empirical evidence from two example microarray data 
sets that clearly supports our proposed three-level hierarchical modeling 
strategy. In Section 4 we demonstrate the practical impact of our suggested 
adaptations by analyzing data from the two microarray experiments with 
several methods. Section 5 describes a variety of simulation studies used to 
verify the validity and improved power of our suggested methods. For both 
real and simulated data sets, the use of our proposed three-level hierarchal 
model dramatically increases power to detect DE genes. 

2. Model descriptions. Throughout this paper, we will use the term 
"group" to denote a set of equivalently expressed (EE) observations from 
a single gene. Consider a microarray data set with expression values for J 
genes from each of / experimental units divided among 2 experimental condi- 
tions. If for gene j there is no difference between the expression distributions 
for experimental units under conditions 1 and 2, then the entire set of / ob- 
servations forms a single group. If for gene j there is a difference between 
the expression distributions for experimental units under conditions 1 and 2, 
then the set of observations from experimental units under condition 1 forms 
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one group and the set of observations from experimental units under condi- 
tion 2 forms a second group. In general, there is at least one group for every 
gene and at most one group for every combination of gene and experimental 
condition. 

Throughout this section, we will use G p (i) to denote the group (subset 
of EE observations) to which the ith experimental unit belongs under the 
pth expression pattern. For example, suppose there is an experiment with 6 
experimental units distributed across 3 treatment groups labeled control, A, 
and B. If a researcher aims to compare each of treatments A and B to 
the control, then expression patterns of interest for each gene are p = 1: 
control = A, control = B; p = 2: control ^ A, control = B; p = 3: control = A, 
control ^ B; and p = 4: control ^ A, control ^ B. If experimental units 1 
and 2 received the control treatment, 3 and 4 received treatment A, and 5 
and 6 received treatment B; then G\(i) = 1 for i = 1, ... ,6; G*2(i) = 1 for 
i = 1, 2, 5, 6 and 2 for i = 3, 4; Gs(i) = 1 for i = 1, 2, 3, 4 and 2 for i = 5, 6; 
Gi(i) = i/2 rounded up to the nearest integer for all i. We will use P to 
denote the number of expression patterns of interest and n p to denote the 
number of groups under expression pattern p. In the example above, P = 4, 
n\ = 1, n,2 = ri3 = 2, and = 3. 

In each model, the marginal density for yj = (yji,yj2, ■ ■ ■ ,Ujl)', the vec- 
tor of observations from the jth gene for / experimental units, is given by 
f(yj\0,Tv) = J2p=i 7r pfp(yj\0)> where 7r = (-7Q, 7r 2 , . . . , tt p )', tt p is the proba- 
bility that a gene follows expression pattern p, 6 is a vector of hyperparam- 
eters for the given model, and f p (yj\0) is the density of under pattern p 
according to the given model. The marginal likelihood of the entire data set 
is given by n/=i /(y^ tt) , since observations between genes are considered 
independent under each of the discussed models. The posterior probability 
gene j follows expression pattern p given y 7 - is . 

For each model, estimates of 7r and 6 that maximize the marginal likeli- 
hood can be obtained using the EM algorithm, treating expression pattern 
as the unknown variable. When used, gene-specific error variances are es- 
timated and treated as fixed before using the EM algorithm to estimate 
other model parameters. Marginal densities and posterior probabilities are 
estimated by treating parameter estimates as the true parameter values in 
the formulas above. 

In the following subsections, we formally define four models and seven 
methods of analysis. The distinguishing features of the seven methods are 
summarized in Table 1 for future reference. 

2.1. The lognormal-normal model. The LNN model for the log scale ob- 
servation for the jth gene from the ith experimental unit under expression 
pattern p can be written as 
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Table 1 

Legend for method and model acronyms 



Relies on distinct 







modeling strategies for 


Uses 


Accounts for 






differences across genes 


gene- specific 


uncertainty in 






and differences across 


error variance 


error variance 


Method 


Model 


DE treatments 


estimates 


estimators 


LNN 


LNN 








LNNMV 


LNNMV 




/ 




LNNMV* 


LNNMV 




/ 




LNNGV 


LNNMV 




✓ 


✓ 


LN 3 


LN 3 


/ 






LN 3 MV* 


LN 3 MV 


V 


/ 




LN 3 GV 


LN 3 MV 


/ 


/ 





The methods with acronyms ending in MV* use point estimates of error variances that 
account for the degrees of freedom used when estimating treatment means (see Section 2.3). 



where 

Tji, . . . , T jnp iV(0, a 2 ) and Eji, . . . , Eji l '~ ' N(0, a 2 ). 

In this expression, /i represents the average expression of all genes and 
groups, TjG p (i) represents a random group effect for observations from the 
G p (i)th group (under pattern p) in the jth gene, and Eji represents a random 
error. 

Under this model, f p (yj\0) is the density from a multivariate normal dis- 
tribution with mean vector (//,..., //)' and pattern specific covariance matrix 
S p = a 2 \ + a 2 M p , where I is the identity matrix and M p is a symmetric ma- 
trix with element [i,j] = 1 if experimental units i and j are in the same 
group under pattern p and = if experimental units i and j are in 
different groups. This model has hyperparameters 6 = (/i, a 2 , a 2 .). 

2.2. The lognormal-normal-normal model. To explicitly model gene ef- 
fects separately from treatment effects, we propose a three-level hierarchical 
model, which we denote LN 3 . Under the LN 3 model, the log scale obser- 
vation from the jth gene and the ith experimental unit under expression 
pattern p is modeled as 

Vji = V + Ij + T jG p (i) + 

where 

7 /^ d - N(0,o*), T n ,...,T jnp U ~N(0,a 2 T ) and 
£j 1 ,...,e jI 1 ~' N(0,a 2 ). 
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In this expression, /i represents the average expression of all genes and 
groups, 7j represents a random gene effect for the jth gene, t^q u\ repre- 
sents a random group effect for observations from the G p (i)th group (under 
pattern p) in the jth gene, and Eji represents a random error. Under expres- 
sion pattern p, the density for the vector of log-scale observations for the 
jth gene, f p (yj\0), is evaluated according to a multivariate normal distri- 
bution with mean vector and pattern specific covariance matrix 
S p = cr 2 I + a 2 J + o~ 2 M p , where I is the identity matrix, J is a matrix of l's, 
and 



M p [i,j] 



1, UG p (i) = G p (j), 
0, otherwise. 



This model has hyperparameters 9 = (fj,,a 2 ,a 2 ,a 2 ) and is a generalization 
of the LNN model. That is, the LNN model is a special case of the LN 3 
model in which a 2 = 0. 

2.3. The lognormal-normal model with gene-specific error variances. The 
LNN model assumes that all genes have a common error variance, cr 2 . This 
assumption can be relaxed to allow each gene to have a unique error vari- 
ance, cr?, forming the LNNMV model. We consider three methods based on 
this model, including EBarrays' LNNMV. 

Under this model, the log scale observation for the jth gene from the ith 
experimental unit under expression pattern p can be written as 

yji = V + r jGp (i) +Eji, 

where 

Tji, . . . , T jnp 1- ~ ' N(0, a 2 ) and . . . , £ji ''~ ' N(0, a 2 ). 

This model has hyperparameters = (/x, a 2 ,cr 2 ), where a 2 = (a 2 , a 2 , . . . , a 2 ). 

The LNNMV method from EBarrays places a scaled inverse chi-squared 
distribution on the gene-specific error variances. That is, a 2 ~ inv-% 2 (df = u, 

scaling = <£), such that v^ja 2 - ~ xt- Given estimates v and <£, the gene- 

, , .9 0&+(I-T)S? , „ 9 . . 

specihc error variances are estimated by a* = — - +/ _ 2 2 , where bj is the 
ordinary sample variance estimator with (/ — T) degrees of freedom for the 
log-scale observations from the jth gene and T is total number of experi- 
mental conditions. 

The denominator of the LNNMV point estimator for a 2 does not account 
for degrees of freedom used when estimating treatment means for each gene 
in the computation of S 2 . Similar to MLEs for a 2 in a traditional ANOVA 
analysis, this estimator systematically underestimates cr 2 , resulting in lib- 
eral detection of differential expression. If one were to use a point estimator 
for cr 2 , we would recommend the less liberal approach of using the pos- 
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terior expectation a 2 = E(a 2 \S 2 ) = - + ^_ T ^_ 2 J . We denote this approach 
as LNNMV*; however, this adjusted denominator does not provide a fully 
adequate solution. 

The EBarrays methods estimate the posterior probability that gene j 
follows expression pattern p given y,- as ^ p ^ p ^ y ^^ assuming all hyper- 

parameter estimates are the true hyperparameter values. This expression is 
clearly sensitive to 0. Given that \i and a 2 are assumed to be the same for all 
genes and there are typically thousands of genes in a microarray data set, the 
effective sample size for estimating these parameters is high so that there will 
generally be little uncertainty associated with the ML estimates fi and a 2 
obtained from the EM algorithm. Therefore, it may be reasonable to act as if 
fi = fi and a 2 = °r when estimating posterior pattern probabilities. Similarly, 
it may also be reasonable to ignore uncertainty in the estimator of a 2 under 
the LNN and LN 3 models. However, when cr| is allowed to vary from gene 
to gene, there will be nonnegligible uncertainty in the corresponding estima- 
tors a 2 , which is not taken into account by assuming a 2 = a 2 . Under a model 
allowing for gene-specific error variances, a better estimator of the posterior 

probability that gene j follows expression pattern p is =pMt^4^tt4^ 

Ep=i7Tp/p(y>^>,*) 

where / p (y,|/i, a 2 , u, $) = / / p (yj|£, a 2 )f(a 2 \i>, $) da 2 , where f(o)\i>,$) 
is the empirically estimated inverse chi-squared prior distribution for <rj. 

Our suggested approach is to estimate v and <I> using the method described 
by Smyth (2004) and compute shrunken estimates <t| = E(a)\Sj) to use 
when fitting the EM algorithm to obtain estimates for /i,cr 2 , and 7r. Then 
when estimating the posterior expression pattern probabilities for each gene, 
we suggest empirically approximating f p {yj\fi, a 2 ., u, <&) as Y,q=i fp&j IA, &ti 
a *q)/Q where a* 2 is the q/(Q + l)th quantile of /(^T||^>,^ , ) and Q is a rea- 
sonably large number like 1000. We denote this method as LNNGV, which 
has hyperparameters 9 = (/i, a 2 , v, $). The effectiveness and impact of this 
suggestion are examined in Sections 4 and 5. 

2.4. The lognormal-normal-normal model with gene-specific error vari- 
ances. As with the LNN model, the LN 3 model assumes that all genes have 
a common error variance, a 2 , and this assumption can be relaxed to form 
the LN 3 MV model, which allows for gene-specific error variances. For the 
LN 3 MV model, we consider two methods, denoted LN 3 MV* and LN 3 GV, 
which incorporate gene-specific error variances in exactly the same way as 
the LNNMV* and LNNGV methods, respectively. The LN 3 MV* (LN 3 GV) 
method is a generalization of the LNNMV* (LNNGV) method. That is, the 
LNNMV* (LNNGV) method is a special case of the LN 3 MV* (LN 3 GV) 
method in which a 2 = 0. 
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3. Evidence supporting need for three-level hierarchical models. Obser- 
vations from a common gene are correlated for many reasons, even across 
differentially expressed treatments. Variability from gene to gene in several 
factors contributes to such correlation, including binding affinity of probe 
sets [Binder et al. (2004)], amount of florescent dye that binds to each cDNA 
fragment [Binder et al. (2004)], RNA transcription and degradation rate 
[Selinger et al. (2003)], and the function of genes' corresponding proteins. 
These considerations imply that models for microarray data should contain 
gene effects like those present in the LN 3 and LN 3 MV models but omitted 
from the models of Kendziorski et al. (2003). 

The theoretical impact of gene effects when detecting DE genes can be 
demonstrated by comparing the modeled variance of differences between 
pairs of observations in two scenarios. The first scenario is when the obser- 
vations in a pair come from different groups in a common gene. The sec- 
ond scenario is when the observations in a pair come from different genes. 
Under the LNN model, the variance of the difference for both scenarios 
is 2(a 2 + a 2 ). That is, the LNN model expects differences among same- 
gene observations from differentially expressed groups to "look like" differ- 
ences among observations from different genes. However, when a gene effect 
is present, the variance for differences between observations from different 
genes is 2(a 2 + a 2 + a 2 ), which is greater than the variance for differences 
between observations from different groups in a common gene, 2(a 2 + a 2 ). 
In this case, the LNN model expects within-gene differences due to differ- 
ential expression to be more extreme than they actually are, which reduces 
the model's power to detect differential expression. Creating a three-level 
hierarchical model by adding normally distributed gene effects is a tractable 
and effective method to correct this shortcoming. A similar argument can be 
made when considering models that accommodate gene-specific error vari- 
ances. 

If information about DE groups for each gene were known for real mi- 
croarray data, we could check for evidence of gene effects by comparing the 
variance of between-gene differences to the variance of within-gene differ- 
ences across DE groups. Because information about DE groups is unknown, 
such a simple strategy is not possible. However, we can fit three-level mod- 
els to actual microarray data and examine the resulting estimates of a 2 . 
Because the two-level models are special cases of three-level models with 
a 2 = 0, estimates of a 2 far from provide evidence in favor of our proposed 
three-level hierarchy over the two-level hierarchy. The next section presents 
results of two example microarray experiments where the estimates of a 2 
provide clear support for the three-level hierarchy. We describe this point in 
detail in the supplementary material [Lund and Nettleton (2012)]. 

As additional evidence of the inadequacy of models that omit gene effects, 
we compare the correlation structure implied by the LNN model to the 
correlation structure present in actual microarray data. 
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Table 2 

Empirical evidence for presence of gene effects 



Dataset (conditions) 






- 2 

7TEE<T T 


Average across condition cov 


DC3000 (NaCl, ctrl) 


0.716 


0.952 


0.681 


0.903 


DC3000 (phen, ctrl) 


0.693 


0.977 


0.677 


0.910 


DC3000 (PEG, ctrl) 


0.352 


0.914 


0.322 


0.838 


DC3000 (H 2 2 , ctrl) 


0.961 


0.957 


0.920 


0.948 


Mouse (Ch, FF) 


0.874 


0.281 


0.245 


0.280 


Mouse (Ch, MP) 


0.824 


0.281 


0.231 


0.279 


Mouse (FF, MP) 


0.956 


0.284 


0.272 


0.284 



Under the LNN model, 



c<yv{yji,yjii) = 




if yji and yjy are EE, 
otherwise. 



For any two experimental units, under the LNN model, ^ J=1 cov(7/jj, 

J = tteeCMO "^) where 7Tee(m') is the proportion of genes that are EE 
between experimental units i and i'. If experimental units i and i' correspond 
to the same experimental condition, an unbiased estimator of is given by 

= J2j=i(yji ~ y-i)(yji' - V-i')/{.J - I), because 7Tee («,«') = 1 in tnis 
case. It follows that is also an unbiased estimator of cr^, where is the 
average of a^(i,i') over all pairs of experimental units (i, i') such that the 
experimental condition associated with experimental units i and i! is the 
same. 

In practice, given an estimate 7Tee(m')> observed covariances between ex- 
perimental units associated with different experimental conditions are often 
much larger than 7Tee(2, O^t- Table 2 summarizes this phenomenon for vari- 
ous treatment comparisons within two separate microarray data sets, which 
are described in Section 4. Each data set was analyzed with the LIMMA 
package for R developed by Smyth (2004). Estimates of 7tee(M') were ob- 
tained by applying the method of Nettleton et al. (2006) to the distribution 
of p-values for each pairwise comparison. The final column provides esti- 
mates of between-treatment covariances, which were computed as the aver- 
age of all the pairwise covariances involving one experimental unit from each 
of the two treatments. The LNN and LNNMV models imply the observed 
between-treatment covariances should closely match 7TEE<7r > but Table 2 
shows that the estimated between-treatment covariances were larger than 
7Tee<7t for every treatment comparison. 

The additional covariance observed between experimental units from dif- 
ferent experimental conditions is easily explained by the presence of gene 
effects. For any two experimental units, under the LN 3 model, Y2j=i cov (2/ji> 
yji') /J = <r~ + ttee^^'Vt rather than 7Tee(z, i')&T- 
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4. Data analysis. 

4.1. Data set descriptions. We analyzed a NimbleGen mRNA data set 
of 5608 genes from the DC3000 strain of the bacterial plant pathogen Pseu- 
domonas syringae, resulting from an unpublished experiment conducted in 
the Department of Plant Pathology at Iowa State University. NimbleGen 
performed RMA normalization on the data [Irizarry et al. (2003)]. The ex- 
periment had two biological replicate samples grown in each of five different 
media: control (ctrl), phenol (phen), sodium chloride (NaCl), polyethylene 
glycol MW8000 (PEG), and hydrogen peroxide (H202). Before analyzing 
the data, the primary investigator suggested that any two noncontrol media 
will be EE only when they are also EE with the control, which reduces the 
number of expression patterns included in the analysis. Because each of the 
four treatments can be either EE or DE with the control, there are 2 4 = 16 
different expression patterns to consider. 

The second data set we analyzed is a subset of the data used in Somel 
et al. (2008), available at the Gene Expression Omnibus (GEO) website as 
GDS3221. This experiment examined the impact of diet on the expression 
of 45,101 genes in mice. We analyzed data from nine Affymetrix GeneChips 
corresponding to three treatment groups of three mice each. Each treatment 
involved ad libidum feeding of one of the following diets: (1) vegetables, 
fruit, and yogurt identical to the diet fed to chimpanzees in their ape facility 
(Ch); (2) McDonald's fast food (FF); (3) mouse pellets on which the mice 
were raised (MP). To keep the presentation simple, we have omitted data 
from a second batch of chips and a fourth diet group (cafeteria food), which 
produced expression profiles very similar to those from the McDonald's diet. 
With the three included treatment groups, there are a total of five possible 
expression patterns: Ch = FF = MP; Ch = FF + MP; Ch / FF = MP; Ch = 
MP / FF; and Ch / FF, Ch / MP, FF ^ MP. 

4.2. Analysis of real data. We analyzed these data sets with each of 
the eight methods and report the resulting parameter estimates from the 
GG, LNN, LNNGV, LN 3 , and LN 3 GV methods in Table 3. [The LNNMV*, 
LNNMV, and LN 3 MV* methods share theoretical models (and thus param- 
eter estimates) with the LNNGV, LNNGV, and LN 3 GV methods, resp.] The 
parameter estimates in Table 3 are consistent with what we expected. For 
both data sets, when a random gene effect is accounted for in the model, the 
estimated treatment effect variance decreases drastically and the gene effect 
variance is estimated to be much larger than the treatment effect variance. 
This means the LN 3 and LN 3 GV methods are able to detect smaller treat- 
ment effects than their respective two-level counterparts, LNN and LNNGV. 
It is not surprising then to see that for both data sets the LNN method esti- 
mates a larger proportion of genes following the null pattern than does the 
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Table 3 

Hyperparameter estimates and estimated proportion of null genes for DC3000 (top) 
and mouse diet (bottom) data from each of the models 



Model used to analyze 



Parameter 


GG 


LNN 


LNNGV 


LN 3 


LN 3 GV 


ft 


69.8 












1.54 










£>* 


0.0254 










/' 




0.501 


0.419 


0.277 


0.264 


*? 




0.982 


0.878 


0.151 


0.101 


•2 

(Try 








0.813 


0.832 


a 2 




0.0129 




0.0116 




4> 






0.00509 




0.00509 


V 






3.546 




3.546 




0.728 


0.721 


0.657 


0.655 


0.492 


a 


269.5 










&0 


4.59 










V* 


0.0187 










A 




0.206 


0.210 


0.194 


0.194 






0.279 


0.281 


0.00468 


0.00678 


«3 








0.278 


0.275 


a 2 




0.00346 




0.00331 




■£ 






0.00249 




0.00249 


;> 






8.186 




8.186 


r^null 


0.958 


0.954 


0.931 


0.802 


0.840 



LN 3 method, or that the LNNGV method estimates a larger proportion of 
genes following the null pattern than does the LN 3 GV method. 

Rather than examining parameter estimates, researchers are often more 
interested in creating lists of genes that are likely to follow expression pat- 
terns of interest. To construct a list of DE genes, one would collect all genes 
with an estimated posterior probability of equivalent expression (ePPEE) 
less than a given threshold. When the ePPEE falls below the given thresh- 
old for many genes, not all identified potentially DE genes may be indi- 
vidually studied further. However, the size and contents of the entire list 
provide important information to researchers about the global effects of the 
treatments on gene expression. The composition of a long list of potentially 
DE genes forms the basis for popular gene set enrichment analyses that are 
commonly used to interpret the results of microarray experiments. To ex- 
amine the practical differences between gene lists created by the methods, 
we begin by plotting the empirical CDF of the ePPEEs for each method for 
the two data sets in Figure 1. These plots quickly provide the observed size 
of a gene list for any PPEE cutoff, obtained by intersecting a vertical line 
at the desired PPEE cutoff with the curve for each method. 
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DC3000 




0.0 0.2 0.4 0.6 0.8 1.0 
X 

Mouse Diet 




0.0 0.2 0.4 0.6 0.8 1.0 
X 



Fig. 1. Comparison across methods of empirical ePPEE CDFs for DC3000 (top) and 
mouse diet (bottom) data. 



The plots show substantial differences between the examined methods 
in the number of detected genes over a wide range of PPEE thresholds. 
For models with gene-specific error variances, incorporating uncertainty in 
estimated error variances greatly reduced the number of detected genes 
(LNNGV and LN 3 GV curves are lower than LNNMV* and LN 3 MV* curves, 
resp.). In the DC3000 data at a PPEE cutoff of 0.1, for example, the 
LNNMV, LNNMV*, and LNNGV methods would produce lists with 1983, 
1498, and 893 genes, respectively. Incorporating gene effects greatly in- 
creased the number of detected genes (LN 3 , LN 3 MV*, and LN 3 GV curves 
are higher than LNN, LNNMV*, and LNNGV curves, resp.). In the mouse 
diet data at a PPEE cutoff of 0.1, for example, the LN 3 GV method identified 
almost three times as many DE genes as the LNNGV method (945 vs. 324 
genes, resp.). These results indicate that differences between the methods' 
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Table 4 

Overlap in lists of top 200 most significant DE genes for DC3000 (top) and mouse diet 

(bottom) data 



Method 


1 


2 


3 


4 


5 


6 


7 


(1) GG 


200 














(2) LNN 


187 


200 












(3) LNNMV 


122 


119 


200 










(4) LNNMV* 


118 


120 


160 


200 








(5) LNNGV 


130 


127 


185 


162 


200 






(6) LN 3 


186 


198 


117 


118 


125 


200 




(7) LN 3 MV* 


117 


114 


194 


154 


184 


113 


200 


(8) LN 3 GV 


77 


81 


137 


149 


133 


79 


135 


(1) GG 


200 














(2) LNN 


193 


200 












(3) LNNMV 


108 


107 


200 










(4) LNNMV* 


125 


124 


152 


200 








(5) LNNGV 


88 


87 


173 


136 


200 






(6) LN 3 


193 


197 


109 


124 


89 


200 




(7) LN 3 MV* 


93 


92 


181 


134 


184 


94 


200 


(8) LN 3 GV 


83 


82 


155 


148 


158 


82 


148 



ePPEEs are practically significant, and care should be taken when choosing 
among the suggested methods. 

Constraints on time, money, material, and personnel resources limit the 
number of genes that researchers will follow up on with further study. Thus, 
the overlap between lists from each method containing a fixed number of the 
most significant genes is an important feature for assessing the similarity be- 
tween methods' results. Table 4 provides the size of pairwise intersections 
of lists containing the 200 most significant genes from each method for the 
DC3000 and mouse diet data sets, respectively. These results show substan- 
tial practical differences between rankings, as many lists overlap by roughly 
half their genes and most lists overlap by fewer than 150 genes. 

5. Simulation study. Here we briefly summarize our simulation study 
and its results. Detailed accounts of simulation procedures and results are 
presented in the supplementary material [Lund and Nettleton (2012)]. 

We conduct a variety of simulations to assess the accuracy and power of 
the considered methods. By "accuracy," we refer to the property that for any 
given collection of genes the average estimated posterior probability for each 
pattern should closely match the proportion of genes in the collection that 
follow the given pattern. By "power," we refer to a method's ability to detect 
differential expression. We prefer the method that creates the largest list of 
genes for a given ePPEE threshold, provided that the method's ePPEEs are 
accurate. 
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We simulated data from each of the five models (GG, LNN, LNNMV, LN 3 , 
and LN 3 MV) using the model parameters reported for the DC3000 data set 
in Table 3. In addition to these model-based simulations, we also conducted 
simulations using an HIV mRNA expression data set from the GEO website, 
named GDS1449. We analyzed each simulated data set with each method 
and recorded estimated posterior probabilities for each expression pattern 
for each gene. 

The simulation results clearly support our claims that failing to distinctly 
model gene and gene-specific treatment effects reduces power and produces 
conservative results and that using point estimates of error variances pro- 
duces liberal results. The LN 3 GV method stands out as the best method 
from these simulations. The LN 3 GV method was the only method to produce 
accurate ePPEEs in all simulation scenarios, and no method produced bet- 
ter average significance rankings (as seen in ROC curves) than those of the 
LN 3 GV method in any simulation scenario. The LN 3 GV method exhibited 
greater power than the LNNGV method, which was the only other method 
that did not produce liberal results in at least one simulation scenario. 

6. Discussion. When modeling a data set that includes multiple obser- 
vations from each of multiple genes, a conventional analysis would begin 
with a model that incorporates gene effects. One might decide to omit gene 
effects if, after looking, there was no evidence of gene effects or if results 
from an analysis were not affected by the omission of gene effects. We have 
demonstrated that gene effects are present in real data sets and provided 
generalizations of the methods based on lognormal two-level hierarchical 
models to include gene effects. These generalizations behave nearly iden- 
tically to their two-level counterparts when analyzing data without gene 
effects and improve power and accuracy when data contain gene effects. 
These extensions serve as an example of how other hierarchical models that 
omit gene effects might be improved by more versatile modeling. 

Using point estimates of gene-specific error variances without accounting 
for their uncertainty produces liberal ePPEEs. We have suggested a cor- 
rected approach that involves integration over an empirically estimated prior 
distribution for the error variances and demonstrated this adaptation yields 
accurate ePPEEs. 

As noted in the Introduction, we have identified nearly a dozen meth- 
ods that omit gene effects. There are far more methods in the microarray 
data analysis literature that do not suffer from this problem. Most pub- 
lished methods explicitly or implicitly include gene effects whose distribu- 
tion is allowed to differ from the distribution of within gene treatment ef- 
fects. Methods based on gene-specific linear models that make no attempt 
to borrow information across genes fall into this category, as do methods 
that borrow information across genes only for the purpose of improved er- 



MODELING GENE AND GENE-SPECIFIC TREATMENT EFFECTS 



15 



ror variance estimation. While we expect our LN 3 GV method to perform 
well when compared against the large collection of competing approaches, 
a broad comparison of methods is beyond the scope of this paper, and we 
make no claims of superiority here. Our main point is that the hierarchical 
modeling approach pioneered by Kendziorski et al. (2003) can be improved 
by the inclusion of both gene and gene-specific treatment effects. Given the 
influential nature of the original work of Kendziorski et al. (2003), we think 
this is an important point to make. 

The development of the LNN and GG models by Kendziorski et al. (2003) 
represents groundbreaking work on the hierarchical modeling of gene expres- 
sion data. We have shown how to improve on the original work by allowing 
for random gene effects and replacing gene-specific error variance point esti- 
mates without dramatically affecting computational costs. Adding random 
gene effects to a model increases the dimension of the parameter space across 
which the EM algorithm must optimize by one, but does not substantially 
increase computational costs. For any of the described methods, analyzing 
a data set with 5000 genes, 9 experimental units, and 4 expression patterns 
of interest takes less than 10 minutes using a single Linux machine and R 
code that calls a C routine to evaluate multivariate normal densities. We 
have developed the R package LN3GV (available at the CRAN webpage) 
to implement the LNNMV*, LNNGV, LN 3 , LN 3 MV*, and LN 3 GV meth- 
ods discussed in this article. Throughout this paper, the GG, LNN, and 
LNNMV methods were implemented via the EBarrays package. 

We have focused on the approach of Kendziorski et al. (2003) not only 
because of its influential nature but also because of its unique and elegant 
approach to inference for experiments with more than two treatments. The 
vast majority of competing approaches have been developed primarily for 
the case of two treatments. While it is easy to extend many of these methods 
to cover the case of more than two treatments, very few methods outside 
the Kendziorski et al. (2003) lineage provide an inherent natural strategy 
for classifying genes according to their pattern of expression across multiple 
treatments. Thus, we believe our efforts to improve the original work of 
Kendziorski et al. (2003) have been well spent. 

SUPPLEMENTARY MATERIAL 

Additional evidence supporting need for three-level hierarchy and simula- 
tion study details (DOI: 10. 1214/12- AOAS535SUPP; .pdf). The correlation 
across genes present in real microarray data makes directly testing the sta- 
tistical significance of gene effect variance estimates intractable. We present 
a simulation study that demonstrates the gene effect variance estimates ob- 
tained when analyzing the DC3000 and mouse diet data sets are drastically 
greater than those that arise when analyzing data simulated without gene 
effects. We also provide detailed accounts of simulation procedures and re- 
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suits used to evaluate the considered methods. These simulations clearly 
support our claims regarding the importance of distinctly modeling gene 
and gene-specific treatment effects and accounting for uncertainty in error 
variance estimators. 
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